- LOCATEDB(5L)                UNIX System V                LOCATEDB(5L)
-
-
-
- NAME
- locatedb - front-compressed file name database
-
- DESCRIPTION
- This manual page documents the format of file name databases
- for the GNU version of locate. The file name databases
- contain lists of files that were in particular directory
- trees when the databases were last updated.
-
- There can be multiple databases. Users can select which
- databases locate searches using an environment variable or
- command line option; see locate(1L). The system
- administrator can choose the file name of the default
- database, the frequency with which the databases are
- updated, and the directories for which they contain entries.
- Normally, file name databases are updated by running the
- updatedb program periodically, typically nightly; see
- updatedb(1L).
-
- updatedb runs a program called frcode to compress the list
- of file names using front-compression, which reduces the
- database size by a factor of 4 to 5. Front-compression
- (also known as incremental encoding) works as follows.
-
- The database entries are a sorted list (case-insensitively,
- for users' convenience). Since the list is sorted, each
- entry is likely to share a prefix (initial string) with the
- previous entry. Each database entry begins with an offset-
- differential count byte, which is the additional number of
- characters of prefix of the preceding entry to use beyond
- the number that the preceding entry is using of its
- predecessor. (The counts can be negative.) Following the
- count is a null-terminated ASCII remainder: the part of the
- name that follows the shared prefix.
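The count arithmetic just described can be illustrated with a minimal Python sketch that turns a sorted list of names into (offset-differential count, remainder) pairs. It is not the frcode source. It works on str values rather than bytes, it omits the LOCATE02 dummy entry, and the names `common_prefix_len` and `front_compress` are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def common_prefix_len(a: str, b: str) -> int:
    """Count the leading characters that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def front_compress(names: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (offset-differential count, remainder) pairs for sorted names."""
    prev_name = ""  # empty predecessor: this sketch omits the dummy first entry
    prev_len = 0    # prefix length the preceding entry used of its predecessor
    for name in names:
        shared = common_prefix_len(prev_name, name)
        # The count is relative to the preceding entry's own prefix length.
        yield shared - prev_len, name[shared:]
        prev_name, prev_len = name, shared


example = ["/usr/src", "/usr/src/cmd/aardvark.c",
           "/usr/src/cmd/armadillo.c", "/usr/tmp/zoo"]
for count, rest in front_compress(example):
    print(count, rest)
```

Run on the names from the EXAMPLE section, the sketch prints counts 0, 8, 6 and -9 with the same remainders as the frcode output shown there, minus the LOCATE02 line.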
-
- If the offset-differential count is larger than can be
- stored in a byte (+/-127), the byte has the value 0x80 and
- the count follows in a 2-byte word, with the high byte first
- (network byte order).
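The sketch below shows how a single count could be serialized under this rule, using Python's `struct` module. The text says only that counts can be negative and that the high byte comes first, so reading the 2-byte word as signed two's complement is an assumption. `encode_count` is a hypothetical name, and counts outside the signed 16-bit range raise `struct.error` here.

```python
import struct

ESCAPE = 0x80  # marks a count that needs the 2-byte form


def encode_count(diff: int) -> bytes:
    """Encode one offset-differential count for the LOCATE02 format."""
    if -127 <= diff <= 127:
        # One byte, two's complement: -9 becomes 0xf7.
        return struct.pack(">b", diff)
    # Escape byte, then a 2-byte word, high byte first (network order).
    # Signed interpretation is an assumption; |diff| > 32767 raises struct.error.
    return bytes([ESCAPE]) + struct.pack(">h", diff)
```

The one-byte range stops at -127 rather than -128 because -128 would be stored as 0x80, which is reserved for the escape.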
-
- Every database begins with a dummy entry for a file called
- `LOCATE02', which llllooooccccaaaatttteeee checks for to ensure that the
- database file has the correct format; it ignores the entry
- in doing the search.
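A reader for this format might look like the following sketch, which checks the dummy entry and rebuilds each name from its predecessor. It makes three assumptions: the dummy entry is stored as a zero count byte followed by LOCATE02 and its null (as in the EXAMPLE output), the dummy serves as the predecessor of the first real entry, and the long-count word is signed. `read_locatedb` is a hypothetical name.

```python
MAGIC = b"\x00LOCATE02\x00"  # assumed layout: count byte 0, dummy name, null


def read_locatedb(data: bytes) -> list:
    """Decode a LOCATE02-format database into a list of names (bytes)."""
    if not data.startswith(MAGIC):
        raise ValueError("not a LOCATE02 database")
    pos = len(MAGIC)
    prev = b"LOCATE02"  # the dummy entry is the first predecessor
    count = 0           # prefix length used by the preceding entry
    names = []
    while pos < len(data):
        code = data[pos]
        pos += 1
        if code == 0x80:
            # Long count: 2-byte word, high byte first (signedness assumed).
            diff = int.from_bytes(data[pos:pos + 2], "big", signed=True)
            pos += 2
        else:
            # Short count: reinterpret the byte as signed (-127..127).
            diff = code - 256 if code > 127 else code
        count += diff
        end = data.index(b"\x00", pos)  # the remainder is null-terminated
        name = prev[:count] + data[pos:end]
        names.append(name)
        prev, pos = name, end + 1
    return names
```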
-
- Databases cannot be concatenated together, even if the
- first (dummy) entry is trimmed from all but the first
- database. This is because the offset-differential count in
- the first entry of the second and following databases will
- be wrong.
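The failure is easier to see with a concrete case. The sketch below expands two hypothetical, separately built databases from (count, remainder) pairs, with the dummy entries already trimmed. Each one decodes correctly on its own. Once they are joined, however, the first entry of the second database inherits the prefix length left over from the last entry of the first. `expand`, `db1` and `db2` are illustrative names only.

```python
def expand(pairs, prev="", count=0):
    """Rebuild names from (offset-differential count, remainder) pairs."""
    names = []
    for diff, rest in pairs:
        count += diff              # running prefix length
        prev = prev[:count] + rest
        names.append(prev)
    return names


# Hypothetical databases, each encoded separately (dummy entries trimmed).
db1 = [(0, "/usr/src"), (5, "tmp/zoo")]  # /usr/src, /usr/tmp/zoo
db2 = [(0, "/bin/ls")]                    # /bin/ls

print(expand(db1))        # ['/usr/src', '/usr/tmp/zoo']
print(expand(db2))        # ['/bin/ls']
print(expand(db1 + db2))  # ['/usr/src', '/usr/tmp/zoo', '/usr//bin/ls']
```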
-
- Page 1 (printed 5/18/98)
-
- There is also an old database format, used by Unix locate
- and find programs and earlier releases of the GNU ones.
- updatedb runs programs called bigram and code to produce
- old-format databases. The old format differs from the above
- description in the following ways. Instead of each entry
- starting with an offset-differential count byte and ending
- with a null, byte values from 0 through 28 indicate offset-
- differential counts from -14 through 14. The byte value
- indicating that a long offset-differential count follows is
- 0x1e (30), not 0x80. The long counts are stored in host
- byte order, which is not necessarily network byte order, and
- host integer word size, which is usually 4 bytes. They also
- represent a count 14 less than their value. The database
- lines have no termination byte; the start of the next line
- is indicated by its first byte having a value <= 30.
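For comparison with the new format, here is a sketch of old-format count encoding under the rules above. It stores a short count as the count plus 14. It writes a long count as the escape byte 0x1e followed by the count plus 14 as a host-order 4-byte integer (`struct` format `=i`). That matches only the usual word size the text mentions, not every host. `encode_old_count` is a hypothetical name.

```python
import struct

OLD_OFFSET = 14    # short counts -14..14 are stored as byte values 0..28
OLD_ESCAPE = 0x1e  # 30: a long count follows


def encode_old_count(diff: int) -> bytes:
    """Encode one offset-differential count for the old database format."""
    if -OLD_OFFSET <= diff <= OLD_OFFSET:
        return bytes([diff + OLD_OFFSET])
    # Long count: the stored value is the count plus 14, in host byte order.
    # "=i" assumes a 4-byte word, which the format only says is usual.
    return bytes([OLD_ESCAPE]) + struct.pack("=i", diff + OLD_OFFSET)
```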
-
- In addition, instead of starting with a dummy entry, the old
- database format starts with a 256-byte table containing the
- 128 most common bigrams in the file list. A bigram is a
- pair of adjacent bytes. Bytes in the database that have the
- high bit set are indexes (with the high bit cleared) into
- the bigram table. The bigram and offset-differential count
- coding makes these databases 20-25% smaller than the new
- format, but makes them not 8-bit clean. Any byte in a file
- name that is in the ranges used for the special codes is
- replaced in the database by a question mark, which not
- coincidentally is the shell wildcard to match a single
- character.
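The following sketch expands the bigram codes in the name bytes of an old-format entry. It assumes the 256-byte table stores the 128 bigrams as consecutive byte pairs, so index i covers table bytes 2i and 2i+1. Apply it only to name bytes and never to a long count word, which can itself contain high-bit bytes. `expand_bigrams` is a hypothetical name.

```python
def expand_bigrams(data: bytes, table: bytes) -> bytes:
    """Replace each high-bit byte with the bigram it indexes in the table."""
    if len(table) != 256:
        raise ValueError("bigram table must be 256 bytes (128 pairs)")
    out = bytearray()
    for b in data:
        if b & 0x80:
            i = b & 0x7F                   # clear the high bit to get the index
            out += table[2 * i:2 * i + 2]  # assumed layout: pairs stored in order
        else:
            out.append(b)                  # ordinary byte, copied unchanged
    return bytes(out)
```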
-
- EXAMPLE
- Input to frcode:
- /usr/src
- /usr/src/cmd/aardvark.c
- /usr/src/cmd/armadillo.c
- /usr/tmp/zoo
-
- Length of the longest prefix of the preceding entry to share:
- 0 /usr/src
- 8 /cmd/aardvark.c
- 14 rmadillo.c
- 5 tmp/zoo
-
- Output from frcode, with trailing nulls changed to newlines
- and count bytes made printable:
- 0 LOCATE02
- 0 /usr/src
- 8 /cmd/aardvark.c
- 6 rmadillo.c
- -9 tmp/zoo
-
- (6 = 14 - 8, and -9 = 5 - 14)
-
- SEE ALSO
- find(1L), locate(1L), locatedb(5L), xargs(1L), Finding Files
- (on-line in Info, or printed)